1 Mount Hood Environmental, PO Box 1303, Challis, Idaho, 83226, USA
2 Mount Hood Environmental, 39085 Pioneer Boulevard #100 Mezzanine, Sandy, Oregon, 97055, USA
3 Mount Hood Environmental, PO Box 4282, McCall, Idaho, 83638, USA

Correspondence: Bryce N. Oldemeyer, Mark Roes

1 Background

Quantile random forest (QRF) models have become popular for quantifying freshwater habitat carrying capacity because their flexible framework avoids common pitfalls associated with noisy data, correlated variables, and non-linear relationships. Recently, three QRF models were fit with fish-habitat data from fish observation studies and the Columbia Habitat Monitoring Program (CHaMP) and used to estimate habitat carrying capacity for ESA-listed populations of Chinook salmon and steelhead during three critical life stages (juvenile summer parr, juvenile winter presmolt, and adult redds) in wadeable streams within the Columbia River Basin. Model covariates were selected from >100 habitat metrics and chosen for their high predictive power (Appendix B of Idaho OSC Team, 2019; See et al. 2021). Since then, additional emphasis has been placed on using the QRF capacity models to inform restoration project design and monitoring, and on increasing the spatial extent of fish-habitat data using streamlined protocols (DASH; Carmichael et al. 2019). Therefore, we conducted a revised covariate selection process for the QRF models that prioritized 1) predictive power, 2) compatibility with future DASH data collection, 3) informing restoration project development and monitoring, 4) minimal imputation for missing CHaMP data, and 5) low covariate correlation. Additionally, we evaluated the assumption made during initial QRF model development that a single model was appropriate for both Chinook salmon and steelhead during each of the three life stages.

Similarly, a random forest (RF) extrapolation model was used to predict habitat capacity across larger spatial scales where CHaMP and/or DASH data were not available (Appendix B of Idaho OSC Team, 2019). We revisited the globally available attributes (GAAs) included in the original RF extrapolation model and made minor modifications that maintained covariates with high predictive power and included metrics that better aligned with the revised QRF model. To evaluate the differences between the original and revised QRF/RF models, we compared watershed carrying capacity estimates produced by both sets of models for eight watersheds located within the Upper Salmon River basin.

This process resulted in revised QRF and RF extrapolation models that were more informative for restoration design and monitoring, included covariates that could be calculated using newly developed stream habitat protocols, and maintained a similar level of predictive power as the original models.

2 Revised QRF Habitat Capacity Model

2.1 Covariate selection process

Habitat covariates for the QRF habitat capacity models were generated from the CHaMP dataset or obtained from other publicly available sources (e.g. NorWest stream temperature data). In total, 129 habitat metrics were examined in the selection process. Covariates were aggregated into eleven metric categories, and 1-4 covariates were chosen from each category based on the following criteria:

  1. How strong was the association between the covariate and the response variable (based on MIC score)?

  2. Could the covariate be calculated using DASH data?

  3. Was the covariate informative for restoration efforts?

  4. How much data were missing, and how many “0” values were there, for the covariate in the fish-habitat dataset?

  5. How correlated was the covariate with other covariates within the same metric category, particularly covariates with higher MIC scores?

Below is a simplified, theoretical example of how a covariate might be selected for a model.

In the original QRF model, discharge was included as a covariate because it had a high MIC score and it made biological sense (i.e. discharge is a significant factor impacting fish habitat use and, presumably, habitat carrying capacity). Unfortunately, discharge is not very informative for restoration efforts because most restoration actions cannot create water. Discharge, like many habitat metrics, is highly correlated with other potential covariates that may have been left out of the original QRF model for any number of reasons (highly correlated with other model covariates, excluded to avoid overfitting, etc.). Using the revised model selection criteria, we observed that average thalweg depth has a MIC score nearly as high as discharge, is informative for restoration efforts, can be calculated from DASH, and is highly correlated with discharge. Based on all the information above, average thalweg depth would be substituted for discharge in the model.
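The screening logic in the discharge example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the report: the `Candidate` fields, the thresholds (`max_missing`, `max_abs_corr`), and every number below are invented, the `score` field is a stand-in for a MIC score, and the report describes the five criteria qualitatively rather than as fixed cut-offs.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    name: str
    category: str                  # metric category, e.g. "Size"
    score: float                   # criterion 1: stand-in for a MIC score
    dash_computable: bool          # criterion 2
    restoration_informative: bool  # criterion 3
    missing_fraction: float        # criterion 4
    values: list                   # observed values, used for criterion 5


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_covariates(candidates, max_missing=0.2, max_abs_corr=0.7, per_category=4):
    """Screen on criteria 2-4, then keep the highest-scoring candidates in each
    category that are not too correlated with covariates already kept."""
    eligible = [c for c in candidates
                if c.dash_computable
                and c.restoration_informative
                and c.missing_fraction <= max_missing]
    selected = []
    for cand in sorted(eligible, key=lambda c: c.score, reverse=True):
        same_cat = [s for s in selected if s.category == cand.category]
        if len(same_cat) >= per_category:
            continue
        if any(abs(pearson(cand.values, s.values)) > max_abs_corr for s in same_cat):
            continue
        selected.append(cand)
    return selected


# Invented values that mirror the discharge / thalweg depth example.
discharge = Candidate("Discharge", "Size", 0.52, True, False, 0.05,
                      [0.4, 0.9, 1.6, 2.2, 3.1, 4.0])
thalweg = Candidate("Average Thalweg Depth", "Size", 0.50, True, True, 0.02,
                    [0.21, 0.30, 0.42, 0.47, 0.60, 0.71])
residual = Candidate("Residual Pool Depth", "Size", 0.35, True, True, 0.10,
                     [0.20, 0.33, 0.40, 0.50, 0.58, 0.75])

chosen = [c.name for c in select_covariates([discharge, thalweg, residual])]
print(chosen)
```

With these made-up inputs, discharge is screened out because it is flagged as uninformative for restoration, and the lower-scoring depth metric is skipped because it is strongly correlated with the one already kept.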

The covariate selection process was conducted independently for both species for all three life stages to test the assumption made during the original QRF model development that it was appropriate to apply the same life stage models to both species.

2.2 Covariate selection results

There were 12-14 covariates selected for each of the six QRF habitat capacity models. While the relative importance of the final covariates in the three life stage models differed between species, the final covariates themselves were nearly identical (Figure 2.1, Figure 2.2, and Figure 2.3). This confirmed that one model for both species per life stage was appropriate. Therefore, we consolidated the species-specific models into single juvenile summer, juvenile winter, and redd models (Table 2.1). Partial dependence plots for the covariates in the revised QRF habitat capacity models indicated effects that were generally biologically intuitive; the plots can be found in Section 4.1.1.

Figure 2.1: Relative importance plots for covariates included in the revised juvenile summer QRF models

Figure 2.2: Relative importance plots for covariates included in the revised juvenile winter QRF models

Figure 2.3: Relative importance plots for covariates included in the revised QRF redds models

Table 2.1: Habitat covariates and their descriptions for three revised life stage QRF capacity models. Numbers indicate where each metric ranked in relative importance for each species. Dashes indicate the metric was not used for a given model.
Name Metric Category Juv Sum Chnk Juv Sum Sthd Juv Win Chnk Juv Win Sthd Redds Chnk Redds Sthd Description
Channel Unit Frequency ChannelUnit 5 11 3 2 1 1 Number of channel units per 100 meters.
Fast NonTurbulent Frequency ChannelUnit 9 13 13 6 Number of Fast Water Non-Turbulent channel units per 100 meters.
Fast Turbulent Frequency ChannelUnit 3 6 4 2 Number of Fast Water Turbulent channel units per 100 meters.
Sinuosity Complexity 13 7 6 5 10 11 Ratio of the thalweg length to the straight line distance between the start and end points of the thalweg.
Wetted Channel Braidedness Complexity 14 14 10 11 Ratio of the total length of the wetted mainstem channel plus side channels and the length of the mainstem channel.
Fish Cover: LW Cover 4 6 Percent of wetted area that has woody debris as fish cover.
Fish Cover: Some Cover Cover 7 3 11 8 9 4 Percent of wetted area with some form of fish cover
Residual Depth Size 2 3 Average residual depth of the channel unit.
Average Thalweg Depth Size 1 2 2 3 Average Thalweg Depth, meters
Thalweg Exit Depth Avg Size 5 4 Depth of the thalweg at the downstream edge of the channel unit.
Residual Pool Depth Size 12 10 11 5 The average difference between the maximum depth and downstream end depth of all Slow Water/Pool channel units.
Discharge Size 1 1 The sum of station discharge across all stations. Station discharge is calculated as depth x velocity x station increment for all stations except first and last. Station discharge for first and last station is 0.5 x station width x depth x velocity.
Substrate Est: Boulders Substrate 8 9 6 12 Percent of boulders (256-4000 mm) within the wetted site area.
Substrate Est: Cobble and Boulder Substrate 7 10 Total cobble plus boulder percentage
Substrate Est: Cobbles Substrate 11 5 8 8 Percent of cobbles (64-256 mm) within the wetted site area.
Substrate Est: Coarse and Fine Gravel Substrate 6 8 8 9 5 13 Percent of coarse and fine gravel (2-64 mm) within the wetted site area.
Substrate Est: Sand and Fines Substrate 10 4 9 7 7 7 Percent of sand and fine sediment (0.01-2 mm) within the wetted site area.
Avg. August Temperature Temperature 2 1 3 10 Average predicted daily August temperature from NorWest, averaged across the years 2002-2011.
Large Wood Frequency: Wetted Wood 4 12 12 9 Number of large wood pieces per 100 meters within the wetted channel.

3 Revised RF Extrapolation Model

The spatial extent of QRF capacity predictions is limited to reaches with high-resolution habitat data (i.e. CHaMP or DASH data). To estimate capacity outside of that extent, we used an extrapolation model fit to “globally available attributes” (GAAs) obtained from a continuous, linear stream network created by Morgan Bond and Tyler Nodine (https://www.fisheries.noaa.gov/resource/data/columbia-basin-historical-ecology-project-data). A random forest model was fit using the GAAs from the linear stream network and used to estimate habitat capacity for the entire Columbia River Basin at a 200-meter reach scale. Consistent with the QRF habitat capacity models, the RF extrapolation model makes no assumptions about the direction and distribution of effects of predictors, and it constrains capacity estimates to the range of predictions produced by the QRF habitat capacity model. However, random forest methods do not account for variable strata weights across the CHaMP dataset, a source of potential bias that could be alleviated through the collection of additional paired fish and habitat data.
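The bounded-prediction property follows from how tree ensembles predict: each leaf returns the mean of the training responses that reach it, and the forest averages those leaf means. The toy example below illustrates this with a deliberately small, one-covariate bagged-tree model written from scratch; it is not the extrapolation model itself. The elevation values and capacity responses are invented, and all function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_tree(xs, ys, depth=0, max_depth=3, min_leaf=2):
    """Tiny one-covariate regression tree. A leaf is the mean response of the
    training rows that reach it; an internal node is (split, left, right)."""
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return mean(ys)
    best = None
    for split in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < split]
        right = [y for x, y in zip(xs, ys) if x >= split]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, split)
    if best is None:
        return mean(ys)
    split = best[1]
    lx = [x for x in xs if x < split]
    ly = [y for x, y in zip(xs, ys) if x < split]
    rx = [x for x in xs if x >= split]
    ry = [y for x, y in zip(xs, ys) if x >= split]
    return (split,
            fit_tree(lx, ly, depth + 1, max_depth, min_leaf),
            fit_tree(rx, ry, depth + 1, max_depth, min_leaf))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        split, left, right = tree
        tree = left if x < split else right
    return tree


def fit_forest(xs, ys, n_trees=50, seed=1):
    """Bagging: each tree is fit to a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    return mean([predict_tree(t, x) for t in forest])


# Invented training data: a GAA (elevation, m) and the capacity response.
xs = [900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]
ys = [120, 150, 210, 260, 300, 280, 240, 190]

forest = fit_forest(xs, ys)
pred_low = predict_forest(forest, 0)      # far below the training range
pred_far = predict_forest(forest, 5000)   # far above the training range
print(min(ys), pred_low, pred_far, max(ys))
```

Even for covariate values far outside the training data, the predictions stay between the smallest and largest training responses, which is the sense in which the extrapolation cannot exceed the range of capacities it was trained on.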

RF extrapolation model covariates were selected from the list of GAAs and evaluated for inclusion by examining relative importance plots (Figure 3.1, Figure 3.2, and Figure 3.3), partial dependence plots (Section 4.1.2), and correlations between covariates. We used the previous extrapolation model as a starting point for covariate selection. Relative to that model, we replaced the “regime” covariate (a categorical indicator of dominant precipitation type) with elevation and removed relative slope, which we found was redundant with gradient. Model results indicated that elevation was consistently one of the most important predictors in the model. This is particularly evident in the Chinook parr summer model, where capacity predictions were primarily driven by elevation.

Figure 3.1: Relative importance plots for covariates included in the revised juvenile summer RF extrapolation models

Figure 3.2: Relative importance plots for covariates included in the revised juvenile winter RF extrapolation models

Figure 3.3: Relative importance plots for covariates included in the revised redds RF extrapolation models

Table 3.1: Globally available attributes (GAAs) and their descriptions used in the random forest extrapolation model.
Metric Description
Gradient % Stream gradient (%).
Sinuosity Reach sinuosity. 1 = straight, 1 < sinuous.
Alpine accumulation Number of upstream cells in alpine terrain.
Fines accumulation Number of upstream cells in fine grain lithologies.
Flow accumulation Number of upstream DEM cells flowing into reach.
Gravel accumulation Number of upstream cells in gravel producing lithologies.
Precipitation accumulation Number of upstream cells weighted by average annual precipitation.
Floodplain width Current unmodified floodplain width.
Avg Aug stream temperature Historical composite scenario representing 10 year average August mean stream temperatures for 2002-2011 (Isaak et al. 2017).
Disturbance PCA 1 Disturbance Classification PCA 1 Score (Whittier et al. 2011).
Natural PCA 1 Natural Classification PCA 1 Score (Whittier et al. 2011).
Natural PCA 2 Natural Classification PCA 2 Score (Whittier et al. 2011).

3.1 Habitat capacity estimates

Habitat carrying capacity was estimated with the revised QRF and RF extrapolation models for Chinook salmon and steelhead during juvenile summer, juvenile winter, and redd life stages for eight watersheds in the Upper Salmon River Basin. Spatial domains for species were originally defined by StreamNet (https://www.streamnet.org/home/data-maps/gis-data-sets/) and revised based on expert knowledge from regional biologists.

3.1.1 Chinook

Figure 3.4: Extrapolations of habitat capacity for Chinook salmon, by life-stage, for the eight watersheds within the Upper Salmon River Basin using the revised models.

Table 3.2: Predicted Chinook salmon habitat capacity by life-stage and watershed using the revised models.
Watershed | Juv summer capacity | Summer SE | Juv winter capacity | Winter SE | Redd capacity | Redd SE
EF Salmon | 927,305 | 134,067.6 | 168,720 | 29,139.1 | 401 | 19.9
Lemhi | 640,356 | 42,966.8 | 148,373 | 14,926.6 | 327 | 10.3
NF Salmon | 267,729 | 35,299.6 | 70,998 | 9,629.9 | 164 | 7.0
Pahsimeroi | 252,477 | 18,767.4 | 90,176 | 10,852.3 | 124 | 3.9
Panther Cr | 930,201 | 95,281.6 | 212,396 | 19,946.9 | 448 | 15.4
Upper Salmon | 1,070,479 | 171,844.7 | 214,157 | 49,611.5 | 571 | 28.3
Valley Cr | 681,809 | 105,580.3 | 141,607 | 34,232.7 | 400 | 19.3
Yankee Fork | 683,719 | 114,345.2 | 144,431 | 25,449.9 | 445 | 23.2

Table 3.2: Predicted Chinook salmon habitat capacity per kilometer by life-stage and watershed using the revised models.
Watershed | Juv summer capacity/km | Summer SE/km | Juv winter capacity/km | Winter SE/km | Redd capacity/km | Redd SE/km
EF Salmon | 5,937 | 858.4 | 1,080 | 186.6 | 3 | 0.1
Lemhi | 4,695 | 315.0 | 1,088 | 109.4 | 2 | 0.1
NF Salmon | 5,132 | 676.7 | 1,361 | 184.6 | 3 | 0.1
Pahsimeroi | 4,901 | 364.3 | 1,750 | 210.6 | 2 | 0.1
Panther Cr | 6,517 | 667.5 | 1,488 | 139.7 | 3 | 0.1
Upper Salmon | 5,539 | 889.2 | 1,108 | 256.7 | 3 | 0.1
Valley Cr | 5,675 | 878.8 | 1,179 | 284.9 | 3 | 0.2
Yankee Fork | 4,773 | 798.2 | 1,008 | 177.7 | 3 | 0.2
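
One way to read the per-kilometer table is as total capacity divided by the stream length in the species domain. The report does not state the lengths or the exact calculation, so this is an assumption; the short check below back-calculates the implied length from the juvenile summer and juvenile winter columns of the two Chinook tables and compares them. The variable names are illustrative only.

```python
# (watershed, summer total, summer per km, winter total, winter per km),
# copied from the two Chinook salmon tables above.
chinook = [
    ("EF Salmon", 927_305, 5_937, 168_720, 1_080),
    ("Lemhi", 640_356, 4_695, 148_373, 1_088),
    ("NF Salmon", 267_729, 5_132, 70_998, 1_361),
    ("Pahsimeroi", 252_477, 4_901, 90_176, 1_750),
    ("Panther Cr", 930_201, 6_517, 212_396, 1_488),
    ("Upper Salmon", 1_070_479, 5_539, 214_157, 1_108),
    ("Valley Cr", 681_809, 5_675, 141_607, 1_179),
    ("Yankee Fork", 683_719, 4_773, 144_431, 1_008),
]

implied_km = {}
for name, sum_total, sum_per_km, win_total, win_per_km in chinook:
    # Implied length (km) = total capacity / capacity per km (assumed relationship).
    km_summer = sum_total / sum_per_km
    km_winter = win_total / win_per_km
    implied_km[name] = (km_summer, km_winter)
    print(f"{name}: {km_summer:.1f} km (summer), {km_winter:.1f} km (winter)")
```

The two life stages imply nearly the same length in every watershed, which is consistent with a single stream length per watershed being used for the Chinook per-kilometer values; small differences are expected because the tabulated per-kilometer values are rounded.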

3.1.2 Steelhead

Figure 3.5: Extrapolations of habitat capacity for steelhead, by life-stage, for the eight watersheds within the Upper Salmon River Basin using the revised models.

Table 3.3: Predicted steelhead habitat capacity by life-stage and watershed using the revised models.
Watershed | Juv summer capacity | Summer SE | Juv winter capacity | Winter SE | Redd capacity | Redd SE
EF Salmon | 286,986 | 23,374.4 | 357,291 | 38,342 | 414 | 24
Lemhi | 313,903 | 9,407.1 | 393,688 | 31,434 | 454 | 20
NF Salmon | 218,295 | 18,132.8 | 302,875 | 28,489 | 340 | 24
Pahsimeroi | 157,169 | 5,660.1 | 219,024 | 15,072 | 201 | 8
Panther Cr | 253,780 | 12,968.4 | 354,740 | 21,533 | 325 | 16
Upper Salmon | 264,394 | 17,563.8 | 331,554 | 40,105 | 460 | 33
Valley Cr | 196,222 | 12,454.0 | 295,869 | 30,711 | 380 | 27
Yankee Fork | 221,357 | 16,828.9 | 351,296 | 39,236 | 475 | 40

Table 3.3: Predicted steelhead habitat capacity per kilometer by life-stage and watershed using the revised models.
Watershed | Juv summer capacity/km | Summer SE/km | Juv winter capacity/km | Winter SE/km | Redd capacity/km | Redd SE/km
EF Salmon | 1,733 | 141.2 | 2,158 | 231.5 | 2 | 0.1
Lemhi | 1,793 | 53.7 | 2,249 | 179.6 | 3 | 0.1
NF Salmon | 1,844 | 153.2 | 2,559 | 240.7 | 3 | 0.2
Pahsimeroi | 1,894 | 68.2 | 2,639 | 181.6 | 2 | 0.1
Panther Cr | 1,990 | 101.7 | 2,782 | 168.8 | 3 | 0.1
Upper Salmon | 1,612 | 107.1 | 2,021 | 244.5 | 3 | 0.2
Valley Cr | 1,633 | 103.6 | 2,462 | 255.6 | 3 | 0.2
Yankee Fork | 1,397 | 106.2 | 2,217 | 247.7 | 3 | 0.3

3.2 Habitat capacity estimates compared with previous QRF and extrapolation

Comparisons of watershed capacity estimates between the previous and revised QRF and RF extrapolation models reveal modest differences in most cases, with the exception of Chinook parr summer capacities in several watersheds. The substantial increases observed for Chinook parr summer capacity were likely due to the inclusion of the elevation covariate in the RF extrapolation model; increases ranged from 13% to 222% compared to the previous extrapolation.

3.2.1 Chinook

Figure 3.6: Change in predicted Chinook salmon habitat capacity estimates from the original model and extrapolation, by life-stage, for the eight watersheds within the Upper Salmon River Basin.

Table 3.4: Estimated Chinook salmon capacities and comparison with previous random forest extrapolations for eight watersheds
Model | Watershed | Capacity per km | Total capacity | Capacity % change | Capacity SE
Juv summer | EF Salmon | 5,937.2 | 927,305 | 2 | 134,068
Juv summer | Lemhi | 4,694.8 | 640,356 | 72 | 42,967
Juv summer | NF Salmon | 5,132.1 | 267,729 | -11 | 35,300
Juv summer | Pahsimeroi | 4,900.6 | 252,477 | 38 | 18,767
Juv summer | Panther Cr | 6,516.6 | 930,201 | -8 | 95,282
Juv summer | Upper Salmon | 5,538.9 | 1,070,479 | -15 | 171,845
Juv summer | Valley Cr | 5,675.0 | 681,809 | -10 | 105,580
Juv summer | Yankee Fork | 4,772.9 | 683,719 | 3 | 114,345
Juv winter | EF Salmon | 1,080.3 | 168,720 | 22 | 29,139
Juv winter | Lemhi | 1,087.8 | 148,373 | -4 | 14,927
Juv winter | NF Salmon | 1,361.0 | 70,998 | 29 | 9,630
Juv winter | Pahsimeroi | 1,750.3 | 90,176 | -5 | 10,852
Juv winter | Panther Cr | 1,488.0 | 212,396 | 36 | 19,947
Juv winter | Upper Salmon | 1,108.1 | 214,157 | -8 | 49,612
Juv winter | Valley Cr | 1,178.7 | 141,607 | 8 | 34,233
Juv winter | Yankee Fork | 1,008.2 | 144,431 | 45 | 25,450
Redds | EF Salmon | 2.6 | 401 | -13 | 20
Redds | Lemhi | 2.4 | 327 | -3 | 10
Redds | NF Salmon | 3.1 | 164 | -6 | 7
Redds | Pahsimeroi | 2.4 | 124 | 11 | 4
Redds | Panther Cr | 3.1 | 448 | -4 | 15
Redds | Upper Salmon | 3.0 | 571 | -20 | 28
Redds | Valley Cr | 3.3 | 400 | -28 | 19
Redds | Yankee Fork | 3.1 | 445 | -37 | 23

3.2.2 Steelhead

Figure 3.7: Change in steelhead habitat capacity estimates from the original model and extrapolation, by life-stage, for the eight watersheds within the Upper Salmon River Basin.

Table 3.5: Estimated steelhead capacities and comparison with previous random forest extrapolations for eight watersheds
Model | Watershed | Capacity per km | Total capacity | Capacity % change | Capacity SE
Juv summer | EF Salmon | 1,733.1 | 286,986 | -22 | 23,374
Juv summer | Lemhi | 1,793.2 | 313,903 | -14 | 9,407
Juv summer | NF Salmon | 1,844.4 | 218,295 | -15 | 18,133
Juv summer | Pahsimeroi | 1,893.6 | 157,169 | -20 | 5,660
Juv summer | Panther Cr | 1,990.0 | 253,780 | -13 | 12,968
Juv summer | Upper Salmon | 1,611.7 | 264,394 | -25 | 17,564
Juv summer | Valley Cr | 1,632.9 | 196,222 | -20 | 12,454
Juv summer | Yankee Fork | 1,397.3 | 221,357 | -21 | 16,829
Juv winter | EF Salmon | 2,157.6 | 357,291 | -9 | 38,342
Juv winter | Lemhi | 2,248.9 | 393,688 | 0 | 31,434
Juv winter | NF Salmon | 2,559.1 | 302,875 | -4 | 28,489
Juv winter | Pahsimeroi | 2,638.9 | 219,024 | 2 | 15,072
Juv winter | Panther Cr | 2,781.7 | 354,740 | 13 | 21,533
Juv winter | Upper Salmon | 2,021.1 | 331,554 | -21 | 40,105
Juv winter | Valley Cr | 2,462.1 | 295,869 | -12 | 30,711
Juv winter | Yankee Fork | 2,217.5 | 351,296 | -16 | 39,236
Redds | EF Salmon | 2.5 | 414 | -13 | 24
Redds | Lemhi | 2.6 | 454 | 13 | 20
Redds | NF Salmon | 2.9 | 340 | -5 | 24
Redds | Pahsimeroi | 2.4 | 201 | 3 | 8
Redds | Panther Cr | 2.5 | 325 | -4 | 16
Redds | Upper Salmon | 2.8 | 460 | -10 | 33
Redds | Valley Cr | 3.2 | 380 | -17 | 27
Redds | Yankee Fork | 3.0 | 475 | -21 | 40

4 Supplemental Figures and Tables

4.1 Partial dependence plots

To support the covariate selection process for the QRF capacity and RF extrapolation models, we generated partial dependence plots that illustrate the predicted effect of covariates on fish density and capacity. These function similarly to traditional covariate effects plots, where predictions of the response are made by altering the value of the covariate of interest while all others are fixed at their mean values. Because random forest models do not place any constraints on the possible mathematical relationships between predictor and response variables, effects curves have been visualized using smoothing methods (LOESS) and may not reflect actual model behavior across the full range of covariate values.
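
The calculation described above (vary one covariate while holding the others at their means) can be sketched as follows. This is a minimal illustration, not the code used to produce the figures: `effect_curve` and `toy_model` are hypothetical names, the toy model merely stands in for a fitted random forest, the covariate values are invented, and the LOESS smoothing step is omitted.

```python
def effect_curve(predict, rows, covariate, grid):
    """Vary one covariate over `grid` while every other covariate is held at
    its mean across `rows`; return (value, prediction) pairs."""
    names = rows[0].keys()
    means = {k: sum(r[k] for r in rows) / len(rows) for k in names}
    curve = []
    for value in grid:
        point = dict(means)        # start from the all-means reference point
        point[covariate] = value   # then alter only the covariate of interest
        curve.append((value, predict(point)))
    return curve


def toy_model(x):
    """Stand-in for a fitted model: the response rises with depth up to 0.5 m,
    then plateaus, and declines with August temperature."""
    return 100 * min(x["depth_m"], 0.5) + 2 * (20 - x["aug_temp_c"])


# Invented habitat observations.
rows = [
    {"depth_m": 0.2, "aug_temp_c": 10.0},
    {"depth_m": 0.4, "aug_temp_c": 12.0},
    {"depth_m": 0.6, "aug_temp_c": 14.0},
]

curve = effect_curve(toy_model, rows, "depth_m", [0.1, 0.3, 0.5, 0.7])
for depth, pred in curve:
    print(depth, pred)
```

Because `effect_curve` only needs a callable that maps a covariate dictionary to a prediction, the same helper could wrap any fitted model; the plateau in the toy curve is the kind of non-linear shape that a smoothed line can partly obscure.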

4.1.1 Revised QRF habitat capacity model

Figure 4.1: Partial dependence plots for covariates included in the revised juvenile summer QRF models

Figure 4.2: Partial dependence plots for covariates included in the revised juvenile summer QRF models

Figure 4.3: Partial dependence plots for covariates included in the revised juvenile winter QRF models

Figure 4.4: Partial dependence plots for covariates included in the revised juvenile winter QRF models

Figure 4.5: Partial dependence plots for covariates included in the revised QRF redds models

Figure 4.6: Partial dependence plots for covariates included in the revised QRF redds models

4.1.2 Revised RF extrapolation model

Figure 4.7: Partial dependence plots for covariates included in the revised juvenile summer RF extrapolation models

Figure 4.8: Partial dependence plots for covariates included in the revised juvenile summer RF extrapolation models

Figure 4.9: Partial dependence plots for covariates included in the revised juvenile winter RF extrapolation models

Figure 4.10: Partial dependence plots for covariates included in the revised juvenile winter RF extrapolation models

Figure 4.11: Partial dependence plots for covariates included in the revised redds RF extrapolation models

Figure 4.12: Partial dependence plots for covariates included in the revised redds RF extrapolation models